Abstract

Recently, sparse representation has been applied to visual tracking to find the target with the minimum reconstruction error from the target template subspace. Though effective, these L1 trackers require high computational costs due to numerous calculations for $\ell_1$ minimization. In addition, the inherent occlusion insensitivity of the $\ell_1$ minimization has not been fully utilized. In this paper, we propose an efficient L1 tracker with minimum error bound and occlusion detection, which we call the Bounded Particle Resampling (BPR)-L1 tracker. First, the minimum error bound is quickly calculated from a linear least squares equation and serves as a guide for particle resampling in a particle filter framework. Without loss of precision during resampling, most insignificant samples are removed before solving the computationally expensive $\ell_1$ minimization function. The BPR technique enables us to speed up the L1 tracker without sacrificing accuracy. Second, we perform occlusion detection by investigating the trivial coefficients in the $\ell_1$ minimization. These coefficients, by design, contain rich information about image corruptions, including occlusion. Detected occlusions enhance the template updates to effectively reduce the drifting problem. The proposed method shows good performance compared with several state-of-the-art trackers on challenging benchmark sequences.


1.  Introduction 

Visual tracking is an important topic for applications such as security and surveillance, vehicle navigation, and human-computer interaction. Designing a robust visual tracking algorithm for a dynamic environment is challenging because of noise, occlusion, varying viewpoints, background clutter, and illumination changes. A thorough review can be found in [25].

Recently, sparse representation [5, 8] has been successfully applied to visual tracking [18, 15, 14]. In these methods, the tracking problem is formulated as finding a sparse representation of the target candidate using templates. The advantage of sparse representation lies in its robustness to a wide range of image corruptions, especially occlusion. These methods perform well, but at the high computational expense of $\ell_1$ minimization. Moreover, the target states are estimated in a particle filter framework, so the computational cost grows proportionally with the number of particle samples. This large computational cost prevents the tracker from being used in real-time systems such as real-time surveillance and military operations. In addition, the rich information captured in the approximation coefficients has not been utilized for occlusion analysis. For example, a gradual occlusion may contaminate the template set and cause it to drift.

Inspired by the work mentioned above, we propose an efficient tracking algorithm with minimum error bound and occlusion detection. Our first contribution is to substantially improve the run-time efficiency of the L1 tracker by using an error bound derived from least squares. Specifically, we observe that the computationally expensive reconstruction error in the sparsely constrained $\ell_1$ minimization is lower-bounded by the least squares reconstruction error, which can be calculated efficiently. This observation motivates us to design a Bounded Particle Resampling (BPR) algorithm, which greatly boosts the speed of the tracking algorithm without sacrificing resampling precision. In particular, the probability of each tracking sample is calculated in two stages. In the first stage, the sample is reconstructed by simply projecting it onto the target template subspace. This reconstruction is solved through a linear least squares equation, which runs orders of magnitude faster than a typical $\ell_1$ minimization function. In the second stage, only dynamically selected samples with small reconstruction errors from the previous stage are reconstructed through $\ell_1$ minimization, while most of the samples are filtered out without solving the $\ell_1$ minimization. With this two-stage reconstruction, the computational cost is greatly reduced, and the proposed tracker runs much faster than our previous L1 tracker [18].

Our second contribution is an occlusion detection method based on the reconstruction coefficients, which is then used to improve the template update procedure. The $\ell_1$ minimization based reconstruction used in the previous method [18] is known to be capable of capturing occlusion information, a property that has previously been exploited for face recognition [22]. While this property enables tracking of occluded targets, it also introduces the risk of adding occluded target information to the template set and potentially causing track failures. To prevent such wrong information from contaminating the template set, we introduce a robust occlusion detection method. The idea is to first build an occlusion map from the trivial coefficients, which indicate pixel-wise image contamination in a given candidate. The occlusion map is then used for occlusion detection, and a candidate is not added to the template set if an occlusion is detected.

For evaluation, the proposed BPR-L1 tracker is tested on several benchmark sequences involving challenges such as occlusion and illumination changes. In all the experiments, our BPR-L1 method shows excellent performance in comparison with previously proposed trackers.

2.  Related  Work 

Due to the extensive literature on visual tracking, we review only representative works and refer interested readers to [25] for a thorough review. Visual tracking methods can be divided into two categories: generative and discriminative. Generative tracking methods use an appearance model to represent the target observations, and tracking is formulated as searching for the target location whose appearance is most similar to the model. Examples of generative tracking methods are the eigentracker [3], the mean shift tracker [ ], the incremental tracker [20], and the covariance tracker [19]. In [20], a tracking method is presented that incrementally learns a low-dimensional subspace representation and efficiently adapts to online changes in the target appearance. To adapt to target appearance variations due to illumination changes, pose changes, etc., the appearance model is often dynamically updated.

Discriminative tracking methods cast tracking as a binary classification problem: tracking is formulated as finding the target location that best separates the target from the background. In [ ], a feature vector is constructed for every pixel in the reference image, and an adaptive ensemble of classifiers is trained to separate pixels that belong to the object from pixels that belong to the background. In [6], a confidence map is built by finding the most discriminative RGB color combination in each frame. A hybrid approach that combines a generative model and a discriminative classifier is used to capture appearance changes and allow reacquisition of an object after total occlusion [27]. Online multiple instance learning is used in [2] to achieve robustness to occlusions as well as other image corruptions. Global mode seeking is used to detect the object after total occlusion and reinitialize the local tracker [26]. Another example [4] uses image fusion to determine the best appearance model for discrimination and then a generative approach for dynamic target updates.

The particle filter (PF) was introduced for visual tracking in [10] and has become a popular framework due to its excellent performance for nonlinear target motion and its flexibility with respect to different object representations. While using more particle samples can improve tracking robustness, the computational load of the particle filter also increases. Several authors have proposed ways to speed up the particle filter framework. In [24], the observation likelihood based on multiple features is computed in a coarse-to-fine manner so that the computation can quickly focus on the more promising regions. In [11], an efficient method is introduced for using subspace representations in a particle filter by applying Rao-Blackwellization to integrate out the subspace coefficients in the state vector; fewer samples are needed since part of the posterior over the state vector is calculated analytically. In [28], the number of particle samples is adjusted based on the noise variance.

Sparse representation was recently introduced for tracking in [18] and later exploited in [15]. In [18], a tracking candidate is sparsely represented as a linear combination of target templates and trivial templates, each of which has only one nonzero element. The sparse representation problem is solved through an $\ell_1$ minimization problem with nonnegativity constraints, which handles the inverse intensity pattern problem during tracking. In [15], group sparsity is integrated and very high-dimensional image features are used to improve tracking robustness. Our work is inspired by these studies, but we use an $\ell_2$ error bound to improve efficiency and introduce an occlusion map for reliable template updating.

Our work shares its philosophy with works in which an error bound is used to guide visual tracking. For example, in [16], a boosting error bound in a co-training framework is used to guide the construction of a novel tracker. However, the use of an error bound in a sparse tracker has not been well explored. Furthermore, our goal is to use the error bound to speed up tracking without sacrificing accuracy.

3. Efficient L1-Tracker with Bounded Particle Resampling

In this section, we first review the original L1-Tracker [18], which combines sparse representation with the particle filter framework. We then present the least squares based minimum error bound, which can be computed much more efficiently than the $\ell_1$ reconstruction error used in the L1-Tracker. After that, we propose the BPR-L1 tracker, which uses the BPR procedure to increase efficiency without loss of accuracy.

3.1. L1-Tracker with Sparse Representation

Particle filter. The L1-Tracker proposed in [18] is formulated as finding a sparse representation in the template subspace. The representation is then used within the particle filter framework [10] for visual tracking. Specifically, for the frame at time $t$, we denote by $x_t$ the state variable describing the location and shape of the target. The tracking problem can be formulated as estimating the state probability $p(x_t \mid z_{1:t})$, where $z_{1:t} = \{z_1, z_2, \ldots, z_t\}$ represents the observations from the first $t$ frames. Tracking proceeds using a two-stage Bayesian sequential estimation, which recursively updates the filtering distribution as

$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}, \qquad (1)$$

$$p(x_t \mid z_{1:t}) \propto p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1}), \qquad (2)$$

where $p(x_t \mid x_{t-1})$ is the state transition probability and $p(z_t \mid x_t)$ is the observation likelihood of state $x_t$. Direct calculation of the above distribution is intractable in practice. Instead, the posterior $p(x_t \mid z_{1:t})$ is approximated by a set of $N$ particle samples $\{x_t^i\}_{i=1}^{N}$ with importance weights $\pi_t^i$. The samples are updated and resampled at each frame.
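For reference, the particle approximation mentioned above can be written in the standard bootstrap-filter form. This is a sketch under the assumption, consistent with Algorithm 1, that samples are drawn from the transition model and resampled every frame, so every particle enters the frame with equal weight $1/N$; any constant normalization factor in the likelihood then cancels.

$$
p(x_t \mid z_{1:t}) \approx \sum_{i=1}^{N} \pi_t^i \, \delta(x_t - x_t^i),
\qquad
\pi_t^i = \frac{p(z_t \mid x_t^i)}{\sum_{j=1}^{N} p(z_t \mid x_t^j)},
$$

where $\delta(\cdot)$ denotes the Dirac delta.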

In the L1-Tracker, the state variable $x_t$ contains the six parameters of an affine transformation. The transition of each parameter of $x_t$ is modeled independently by a Gaussian distribution around the previous state $x_{t-1}$, and $N$ candidate samples are generated based on the state transition model $p(x_t \mid x_{t-1})$. The observation model $p(z_t \mid x_t)$ reflects the similarity between a target candidate and the target templates, and is formulated using the approximation error of the sparse representation described below.
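The state transition step can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' MATLAB code: the function name `draw_particles`, the layout of the six affine parameters, and the per-parameter standard deviations are all hypothetical.

```python
import numpy as np

def draw_particles(prev_state, sigmas, num_particles, rng):
    """Draw candidate affine states around the previous state.

    Each of the six affine parameters is perturbed independently by
    zero-mean Gaussian noise with its own standard deviation.
    """
    prev_state = np.asarray(prev_state, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    noise = rng.standard_normal((num_particles, prev_state.size)) * sigmas
    return prev_state + noise

rng = np.random.default_rng(0)
x_prev = np.array([120.0, 80.0, 1.0, 0.0, 1.0, 0.0])       # hypothetical parameter layout
sigmas = np.array([4.0, 4.0, 0.01, 0.002, 0.002, 0.001])   # hypothetical noise levels
particles = draw_particles(x_prev, sigmas, 600, rng)        # one row per candidate state
```

Sampling all particles in one vectorized call keeps the cost of this step negligible compared with the likelihood evaluation.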

Sparse representation. To model the observation likelihood $p(z_t \mid x_t)$, a patch corresponding to state $x_t$ is first cropped from frame $z_t$.^1 The patch is then normalized and reshaped into a 1D vector $y$, which is used as a target candidate.

The sparse representation of $y$ is formulated as a minimum error reconstruction through a regularized $\ell_1$ minimization function with nonnegativity constraints

$$\min_{c} \; \|Bc - y\|_2^2 + \lambda \|c\|_1, \quad \text{s.t.} \; c \succeq 0, \qquad (3)$$

where $B = [T, I, -I]$ is composed of the target template set $T$ and the trivial template sets $I$ and $-I$. Each column in $T$ is a target template generated by reshaping the pixels of a candidate patch into a column vector, and each column in the trivial template sets is a unit vector that has only one nonzero element. $c = [a^\top, e^\top]^\top$ is composed of the target coefficients $a$ and the trivial coefficients $e$, respectively.

^1 The frame at time $t$ is treated as the observation $z_t$.

Algorithm 1 Particle filter for L1-Tracker
1: t = 0, initialize samples $x_0^i$ for i = 1, 2, ..., N
2: for t = 1 to number of frames do
3:   for i = 1 to number of samples do
4:     Draw sample $x_t^i$ with respect to $p(x_t \mid x_{t-1})$
5:     Prepare the observation $y_t^i$ from $x_t^i$
6:     Calculate the observation probability $p(y_t^i \mid x_t^i)$
7:   end for
8:   Resample with respect to $p(y_t^i \mid x_t^i)$: the number of times that $x_t^i$ appears in the new set is $N\,p(y_t^i \mid x_t^i)$, with the probabilities normalized to sum to one and the count rounded to the nearest integer
9: end for
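A minimal sketch of the representation in (3), assuming NumPy. Because $c \succeq 0$, $\|c\|_1 = \sum_k c_k$, so the objective is smooth on the feasible set and plain projected gradient descent (clip negatives to zero) solves it. This solver is a swapped-in technique chosen for brevity; it is not the interior-point/PCG solver of the original L1-Tracker or the SPAMS routine mentioned in Section 5, and the names `build_dictionary` and `nonneg_l1`, the value of $\lambda$, and the synthetic data are hypothetical.

```python
import numpy as np

def build_dictionary(T):
    """Return B = [T, I, -I]: target templates plus positive/negative trivial templates."""
    d = T.shape[0]
    return np.hstack([T, np.eye(d), -np.eye(d)])

def nonneg_l1(B, y, lam, iters=2000):
    """Projected gradient for min ||B c - y||_2^2 + lam * ||c||_1 subject to c >= 0."""
    step = 1.0 / (2.0 * np.linalg.norm(B, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    c = np.zeros(B.shape[1])
    for _ in range(iters):
        grad = 2.0 * B.T @ (B @ c - y) + lam          # gradient of the smooth objective on c >= 0
        c = np.maximum(c - step * grad, 0.0)          # project back onto c >= 0
    return c

rng = np.random.default_rng(1)
d, n, lam = 180, 10, 0.01              # 15 x 12 patches, 10 target templates, hypothetical lam
T = rng.uniform(0.2, 1.0, size=(d, n))
T /= np.linalg.norm(T, axis=0)         # unit-norm template columns
y = T[:, 0].copy()
y[:30] = 1.0                           # simulate a bright occluder on 30 pixels
B = build_dictionary(T)
c = nonneg_l1(B, y, lam)
a = c[:n]                              # target coefficients
e = c[n:n + d] + c[n + d:]             # trivial coefficients, combined per pixel
```

In this synthetic example the trivial coefficients `e` should concentrate on the 30 simulated occluded pixels, which is the property exploited for occlusion detection in Section 4.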

Finally, the observation likelihood is derived from the reconstruction error of $y$ as

$$p(z_t \mid x_t) = \frac{1}{\Gamma} \exp\{-\alpha \|Ta - y\|_2^2\}, \qquad (4)$$

where $a$ is obtained by solving the $\ell_1$ minimization (3), $\alpha$ is a constant controlling the shape of the Gaussian kernel, and $\Gamma$ is a normalization factor.

At time $t$, the candidate with the maximum observation likelihood is chosen as the tracking result. The likelihood also serves as the sample weight for resampling in the particle filter. A summary of the particle filter for the L1-Tracker is given in Algorithm 1.
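The likelihood in (4) and the selection rule above can be sketched as follows, assuming NumPy and that the target coefficients $a$ of each candidate have already been computed (e.g., by solving (3)). The constant $\Gamma$ is dropped because it cancels when the weights are normalized; the function names and the toy data are hypothetical.

```python
import numpy as np

def likelihoods(T, coeffs, Y, alpha):
    """Unnormalized likelihoods exp(-alpha * ||T a_i - y_i||^2) for all candidates.

    T      : d x n target templates
    coeffs : N x n target coefficients a_i (one row per candidate)
    Y      : N x d candidate vectors y_i
    """
    residuals = coeffs @ T.T - Y              # row i is (T a_i - y_i)^T
    errors = np.sum(residuals ** 2, axis=1)   # squared l2 reconstruction errors
    return np.exp(-alpha * errors)

def select_and_weight(lik):
    """Return the maximum-likelihood candidate index and normalized weights."""
    return int(np.argmax(lik)), lik / lik.sum()

T = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])     # toy templates, d = 3, n = 2
coeffs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [1.0, 1.0, 2.0]])
lik = likelihoods(T, coeffs, Y, alpha=1.0)
best, weights = select_and_weight(lik)
```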

3.2.  Minimum  Error  Bound 

In this section, we show that the reconstruction error from the target templates in the $\ell_2$ norm is bounded from below by a minimum error that can be calculated much faster than solving an $\ell_1$ minimization function.

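Least squares error bound. The observation likelihood defined in (4) builds on the reconstruction error $\|Ta - y\|_2^2$ measured in the $\ell_2$ norm. Since $a$ is calculated by the $\ell_1$ minimization (3), there is a natural lower bound for the reconstruction error

$$\|Ta - y\|_2^2 \ge \|T\hat{a} - y\|_2^2, \qquad (5)$$

where

$$\hat{a} = \arg\min_{b} \|Tb - y\|_2^2 \qquad (6)$$

is the linear least squares approximation of $y$ in the subspace spanned by $T$. The inequality holds because $\hat{a}$ minimizes the reconstruction error over all coefficient vectors, including $a$. One can also view $\hat{a}$ as a degenerate case of $a$ when $\lambda = 0$ (with the trivial templates and the nonnegativity constraint dropped). Similarly, for the observation likelihood $p(z_t \mid x_t)$, we derive its upper bound $q(z_t \mid x_t)$ using the least squares approximation error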

$$q(z_t \mid x_t) = \frac{1}{\Gamma} \exp\{-\alpha \|T\hat{a} - y\|_2^2\}, \qquad (7)$$

where $\alpha$ and $\Gamma$ are the same as in (4). We immediately have

$$p(z_t \mid x_t) \le q(z_t \mid x_t). \qquad (8)$$
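Inequalities (5) and (8) can be checked numerically. The sketch below, assuming NumPy and hypothetical random data, computes the least squares error of (6) with `numpy.linalg.lstsq` and compares it with the error of an arbitrary nonnegative coefficient vector standing in for the $\ell_1$ solution $a$; since (5) holds for any coefficient vector, the bound does not depend on how $a$ was obtained. The value of $\alpha$ is hypothetical.

```python
import numpy as np

def ls_error(T, y):
    """Least squares reconstruction error min_b ||T b - y||_2^2, as in (6)."""
    b, *_ = np.linalg.lstsq(T, y, rcond=None)
    return float(np.sum((T @ b - y) ** 2))

def likelihood_from_error(err, alpha):
    """exp(-alpha * err); the common factor 1/Gamma is omitted."""
    return np.exp(-alpha * err)

rng = np.random.default_rng(2)
T = rng.uniform(size=(180, 10))
y = rng.uniform(size=180)
a_any = np.abs(rng.standard_normal(10))       # stands in for the l1 solution a
err_a = float(np.sum((T @ a_any - y) ** 2))   # left-hand side of (5)
err_ls = ls_error(T, y)                       # right-hand side of (5)
alpha = 0.1
p = likelihood_from_error(err_a, alpha)       # corresponds to (4)
q = likelihood_from_error(err_ls, alpha)      # corresponds to (7)
```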

Efficiency analysis. The linear least squares problem in (6) can be solved by Cholesky factorization (of the normal equations) or QR factorization. For dense matrices, the cost of the Cholesky factorization method is $dn^2 + (1/3)n^3$, while the cost of the QR factorization method is $2dn^2$, where $d$ is the image dimension and $n$ is the number of templates. The QR factorization method is therefore slower by a factor of at most 2, a ratio that is approached when $d \gg n$, as is the case for our problem. For small and medium-size problems, this factor of two is not significant compared with the better numerical accuracy of QR, and the QR factorization is the recommended method.
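The two factorization routes can be sketched with NumPy as follows (the function names and test data are hypothetical). The Cholesky route forms $T^\top T$, which squares the condition number of $T$; this is the accuracy cost that the QR route avoids at up to twice the arithmetic.

```python
import numpy as np

def lstsq_qr(T, y):
    """Solve min ||T b - y|| via thin QR: T = Q R, then R b = Q^T y."""
    Q, R = np.linalg.qr(T)                # reduced QR: Q is d x n, R is n x n upper triangular
    return np.linalg.solve(R, Q.T @ y)    # a triangular solver would exploit R's structure

def lstsq_cholesky(T, y):
    """Solve the normal equations T^T T b = T^T y via Cholesky, T^T T = L L^T."""
    L = np.linalg.cholesky(T.T @ T)       # lower-triangular factor
    z = np.linalg.solve(L, T.T @ y)       # forward substitution step
    return np.linalg.solve(L.T, z)        # back substitution step

rng = np.random.default_rng(4)
T = rng.standard_normal((180, 10))        # d = 180, n = 10 as in the example below
y = rng.standard_normal(180)
b_qr = lstsq_qr(T, y)
b_chol = lstsq_cholesky(T, y)
b_ref = np.linalg.lstsq(T, y, rcond=None)[0]
```

For a production version, `scipy.linalg.solve_triangular` would replace the general solves on the triangular factors.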

The original L1-Tracker uses the preconditioned conjugate gradients (PCG) method [12] to solve the $\ell_1$ minimization function. The PCG algorithm computes the search direction, and the run time is determined by the product of the total number of PCG steps required over all iterations and the cost of a PCG step. The total number of PCG iterations required by the truncated Newton interior-point method depends on the value of the regularization parameter $\lambda$. In our experiments, we found that the total number of PCG steps is a few hundred. The computationally most expensive operation in a PCG step is a matrix-vector product, which has $O(d(2d + n)) = O(d^2 + dn)$ computational complexity.

From the complexity analysis, we can see that solving the least squares problem in (6) is about two orders of magnitude faster than computing the $\ell_1$ solution. For example, with a template size of $15 \times 12$, we have $d = 15 \times 12 = 180$, and the number of templates is $n = 10$. The cost of the Cholesky factorization method is $dn^2 + (1/3)n^3 = 180 \times 100 + (1/3) \times 1000 \approx 18000$, whereas the cost of a single PCG step is $O(d^2 + dn) = O(32400)$, and a few hundred such steps are required.
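To make the comparison concrete, the worked estimate below plugs the same numbers into both costs. The count of 300 PCG steps is an assumption standing in for "a few hundred", and the result is an order-of-magnitude estimate, not a measured timing.

$$
\begin{aligned}
C_{\text{LS}} &= dn^2 + \tfrac{1}{3}n^3 = 18000 + 333 \approx 1.8 \times 10^4, \\
C_{\text{PCG step}} &\approx d^2 + dn = 32400 + 1800 = 34200, \\
C_{\ell_1} &\approx 300 \times 34200 \approx 1.0 \times 10^7, \\
C_{\ell_1} / C_{\text{LS}} &\approx 1.0 \times 10^7 / 1.8 \times 10^4 \approx 560.
\end{aligned}
$$

A ratio of several hundred is consistent with the roughly two orders of magnitude stated above.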

3.3.  Bounded  Particle  Resampling 

As shown in the previous section, computing the observation likelihood $p(z_t \mid x_t)$ is much more expensive than computing its upper bound $q(z_t \mid x_t)$. It is therefore attractive to use the upper bound for samples that are not promising enough and to compute the true likelihood only for the promising samples. However, in addition to finding the candidate with the maximum likelihood, we still need the sample observation likelihoods for resampling. Fortunately, in many cases only a small portion of the samples is preserved after resampling, and an efficient algorithm can be designed to avoid computing all observation likelihoods.

We denote by $X_t = \{x_t^1, x_t^2, \ldots, x_t^N\}$ the sample set at time $t$, and denote by $p_i = p(z_t \mid x_t^i)$ and $q_i = q(z_t \mid x_t^i)$ the observation likelihood and its corresponding upper bound defined in the previous subsection. At the end of each frame, the samples are resampled with respect to their observation likelihoods. We have the following observation.


Algorithm 2 Two-stage Bounded Resampling
1: Input: sample set $X_{t-1} = \{x_{t-1}^i\}_{i=1}^N$
2: Output: sample set $X_t = \{x_t^i\}_{i=1}^N$
3: /* Stage 1 */
4: for i = 1 to N do
5:   Draw sample $x_t^i$ from $x_{t-1}^i$
6:   Prepare the sample appearance $y_t^i$ from $x_t^i$
7:   Solve the linear least squares problem (6) for $y_t^i$
8:   Compute $q_i$ according to (7)
9: end for
10: Sort samples in descending order according to $q_i$
11: /* Stage 2 */
12: $i \leftarrow 1$, $\tau_1 \leftarrow 0$
13: while $i \le N$ and $q_i \ge \tau_i$ do
14:   Solve the $\ell_1$ minimization problem (3) for $y_t^i$
15:   Compute $p_i$ according to (4)
16:   $\tau_{i+1} \leftarrow \tau_i + p_i / (2N - 1)$
17:   $i \leftarrow i + 1$
18: end while
19: $p_j \leftarrow 0$, $\forall j \ge i$
20: /* Resampling */
21: $X_t \leftarrow$ Resample with respect to $\{p_k\}_{k=1}^N$


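Observation 1. If the sample $x_t^i$ appears at least once in the resampled set, its observation probability must satisfy the following condition

$$p_i \ge \frac{1}{2N} \sum_{j=1}^{N} p_j. \qquad (9)$$

The observation is straightforward because the number of samples remains unchanged before and after resampling: sample $i$ receives $N p_i / \sum_{j=1}^{N} p_j$ copies, rounded to the nearest integer. The “2” in the denominator is due to this rounding, since receiving at least one copy requires $N p_i / \sum_{j=1}^{N} p_j \ge 1/2$.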
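The rounding rule behind Observation 1 can be illustrated with a small NumPy sketch; the function name and the example probabilities are hypothetical. Counts are $N p_i / \sum_j p_j$ rounded half up, so a sample receives a copy exactly when $p_i \ge \sum_j p_j / (2N)$.

```python
import numpy as np

def rounding_resample_counts(p):
    """Copies of each sample: N * p_i / sum(p), rounded half up.

    Rounded counts need not add up to exactly N; a complete resampler
    would have to adjust the total, which this sketch does not do.
    """
    p = np.asarray(p, dtype=float)
    N = p.size
    return np.floor(N * p / p.sum() + 0.5).astype(int)

p = np.array([0.44, 0.26, 0.18, 0.08, 0.04])
counts = rounding_resample_counts(p)
threshold = p.sum() / (2 * p.size)   # Observation 1: surviving samples need p_i >= threshold
```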

Motivated by this observation, we develop a two-stage bounded resampling algorithm to calculate the probabilities of the tracking candidates. The first stage is straightforward: we compute the probability bounds $q_i$ for all samples and sort them in descending order. Without loss of generality, we assume in the following that the samples are already sorted, i.e., $q_1 \ge q_2 \ge \cdots \ge q_N$.

In the second stage, our task is to calculate the observation probability $p_i$ for the samples that will survive resampling. This can be done efficiently even for a large number of samples by using a dynamically updated threshold $\tau$ to exclude samples that will be discarded. In particular, $\tau$ is defined according to the following theorem:


Theorem 1. If the sample $x_i$ appears at least once after resampling, its likelihood bound $q_i$ is no less than a threshold $\tau_i$ defined as

$$\tau_i = \frac{1}{2N - 1} \sum_{j=1}^{i-1} p_j. \qquad (10)$$


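Proof. From Observation 1, and because all $p_j \ge 0$, we have

$$2N p_i \ge \sum_{j=1}^{N} p_j \ge \sum_{j=1}^{i} p_j. \qquad (11)$$

Subtracting $p_i$ from both sides and dividing by $2N - 1$, we have

$$p_i \ge \frac{1}{2N - 1} \sum_{j=1}^{i-1} p_j = \tau_i. \qquad (12)$$

Using the fact that $q_i$ is an upper bound of $p_i$, we have $q_i \ge p_i \ge \tau_i$. □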

From the definition, we see that the thresholds are nondecreasing, i.e., $0 = \tau_1 \le \tau_2 \le \cdots \le \tau_N$, and

$$\tau_{i+1} = \tau_i + \frac{p_i}{2N - 1}, \qquad (13)$$

which can be used for an efficient threshold update.

With the above thresholds, in the second stage we start with the first sample, which has the largest likelihood bound $q_1$, calculate its probability $p_1$ according to (4), and update the corresponding threshold $\tau_2$ according to (13). We then continue with samples $2, 3, \ldots$. For the $i$-th sample, if the likelihood bound satisfies $q_i \ge \tau_i$, we compute the observation likelihood $p_i$ and update the threshold $\tau_{i+1}$. Otherwise $q_i < \tau_i$, which according to Theorem 1 implies that $x_i, x_{i+1}, \ldots, x_N$ will all be discarded during resampling, because $q_j \le q_i < \tau_i \le \tau_j$ for all $j \ge i$. We then directly set $p_i = p_{i+1} = \cdots = p_N = 0$. The probabilities $p_1, p_2, \ldots, p_N$ are then used for resampling the set $X_t$. The proposed two-stage bounded resampling is summarized in Algorithm 2.
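Stage 2 of Algorithm 2 can be sketched as follows, assuming NumPy. The expensive likelihood is passed in as a callable so the sketch stays independent of any particular $\ell_1$ solver; the function name, the callable, and the synthetic likelihoods in the usage example are hypothetical. Samples are visited through the sort order, so the sample array itself need not be reordered.

```python
import numpy as np

def bounded_resampling_probs(q, exact_p):
    """Stage 2 of bounded particle resampling (a sketch of Algorithm 2, lines 12-19).

    q       : upper bounds q_i for all N samples, e.g. from the least squares bound (7)
    exact_p : callable i -> p_i, the expensive l1-based likelihood (4)
    Returns the probability vector (zero for skipped samples) and the number
    of exact evaluations performed.
    """
    q = np.asarray(q, dtype=float)
    N = q.size
    order = np.argsort(-q)          # Stage 1 result: visit samples by descending bound
    p = np.zeros(N)
    tau = 0.0                       # tau_1 = 0
    evaluated = 0
    for i in order:
        if q[i] < tau:              # Theorem 1: this and all remaining samples are discarded
            break
        p[i] = exact_p(i)
        evaluated += 1
        tau += p[i] / (2 * N - 1)   # threshold update (13)
    return p, evaluated

rng = np.random.default_rng(3)
N = 600
true_p = np.exp(-5.0 * rng.exponential(size=N))   # synthetic exact likelihoods p_i
q = true_p * rng.uniform(1.0, 2.0, size=N)        # synthetic bounds with q_i >= p_i
p_bpr, evaluated = bounded_resampling_probs(q, lambda i: true_p[i])
print(evaluated, "of", N, "samples needed the exact likelihood")
```

In the usage example, every sample that would survive rounding resampling with the exact probabilities of all $N$ samples is among the evaluated ones, as Theorem 1 guarantees.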

The above Bounded Particle Resampling (BPR) procedure does not sacrifice resampling precision, as guaranteed by Theorem 1, while it avoids the expensive computation for samples with low likelihoods. The amount of time saved is mainly determined by the dissimilarity between the tracking foreground and its surrounding background: intuitively, the larger the foreground/background difference, the greater the speedup from the BPR procedure. Furthermore, the BPR framework makes it affordable to use more particles with a larger sampling variance than in the previously proposed L1-Tracker [18], so BPR can help improve tracking accuracy in addition to computational efficiency.

Empirical studies. Fig. 1 shows the curves of $p$, $q$, and $\tau$ calculated from one frame when 600 particles are used. After about 100 particle samples, $q$ becomes smaller than $\tau$, and the probabilities $p$ of the remaining samples are set to 0 without solving the computationally expensive $\ell_1$ minimization. For this frame, only about 20% of the samples require solving the $\ell_1$ minimization, and we achieve about a five-fold speedup.

Fig. 2 shows the run time and the number of particle samples for which the $\ell_1$ minimization is performed. We can see that the run time is proportional to the number of these samples and is dominated by the second-stage probability calculation. For most of the frames in the sequence, only 20% of the particle samples require the $\ell_1$ minimization. From frame 190 to 230, the man comes out of the shop and the woman is partially occluded. We clearly see that the number of particles requiring the $\ell_1$ minimization increases dramatically when the target is occluded. In that case, none of the samples can model the target appearance well enough, so the probability mass is spread over many particles. The more the probabilities concentrate on the first few samples, the fewer particles need the $\ell_1$ minimization; when the probabilities are evenly distributed, more particles are needed, which results in a longer run time for the frame.

Figure 2. The run time and number of particle samples for sequence OneLeaveShopReenter2cor.

4.  Occlusion  Detection 

The template set needs to be updated to capture the appearance variations of the target during tracking. In [18], the tracking result is added to the template set if none of the templates is similar to it. Therefore, the tracker is vulnerable to failures when a tracking result with a large occlusion is added to the template set. To prevent such an improper addition, we propose a method to detect large occlusions in the tracking result before it is added to the template set.

For occlusion detection, we investigate the responses of the trivial templates when solving the $\ell_1$ minimization (3). The trivial templates are activated when the pixel intensity cannot be approximated well by the target templates. Therefore, we explore the trivial template coefficients for occlusion detection. We convert the 1D trivial coefficient vector into a 2D trivial image by reversing the way the target template is vectorized, so that each pixel in the trivial image maps to the pixel at the same location in the template image. We threshold the trivial image and obtain a 2D binary image that we call the occlusion map. A white pixel in the occlusion map indicates that the pixel is occluded, and a black pixel indicates no occlusion. We assume that an occluder is large and that its intensity is different enough to be separated from small random noise. Therefore, an occlusion appears as a large connected region in the occlusion map, and occlusion detection reduces to finding a white area that is large enough to be classified as an occlusion. After applying morphological operations to the occlusion map to remove small areas and fill small holes between regions, we count the number of pixels in the largest region. If this area is larger than a predefined threshold, e.g., 30% of the area of the occlusion map, we conclude that there is an occlusion in the tracking result and the template set should not be updated.
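The occlusion test can be sketched in NumPy as below. Several details are assumptions not specified in the text: the trivial image is taken as $e^+ + e^-$ (the coefficients of $I$ and $-I$), templates are assumed to be vectorized column-major as in MATLAB, the coefficient threshold `coef_thresh` is hypothetical, the morphological operations are a 3×3 opening followed by a closing, and regions are 4-connected. Only the 30% area ratio comes from the text; all function names are hypothetical.

```python
import numpy as np
from collections import deque

def dilate(mask):
    """3x3 binary dilation (pixels outside the image count as background)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask):
    """3x3 binary erosion (pixels outside the image count as foreground)."""
    h, w = mask.shape
    padded = np.pad(mask, 1, constant_values=True)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def largest_region(mask):
    """Size of the largest 4-connected True region, found by breadth-first search."""
    h, w = mask.shape
    seen = np.zeros_like(mask)
    best = 0
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            seen[sy, sx] = True
            queue, size = deque([(sy, sx)]), 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best

def occlusion_detected(e_pos, e_neg, shape, coef_thresh=0.1, area_ratio=0.3):
    """Occlusion test on the trivial coefficients of I (e_pos) and -I (e_neg)."""
    # Assumption: column-major (MATLAB-style) vectorization, undone with order="F".
    trivial = (np.asarray(e_pos) + np.asarray(e_neg)).reshape(shape, order="F")
    occ = trivial > coef_thresh           # binary occlusion map, True = occluded
    occ = dilate(erode(occ))              # opening: remove small areas
    occ = erode(dilate(occ))              # closing: fill small holes
    return largest_region(occ) > area_ratio * occ.size

shape = (15, 12)
img = np.zeros(shape)
img[:10, :] = 0.8                         # a large occluder covering the top rows
e_pos = img.flatten(order="F")
e_neg = np.zeros_like(e_pos)
print(occlusion_detected(e_pos, e_neg, shape))
```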

An occlusion usually persists for some time after it is detected. For example, when the target is occluded by an object that is moving away, the occlusion becomes smaller before it disappears. In our occlusion detection method, we therefore avoid updating the template set for the next 5 frames after an occlusion is detected.
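The update-suppression rule can be expressed as a small state machine. The class name is hypothetical, and the sketch covers only the occlusion rule, not the template-similarity test of [18].

```python
class TemplateUpdateGate:
    """Suppress template updates on the occluded frame and the next `hold_frames` frames."""

    def __init__(self, hold_frames=5):
        self.hold_frames = hold_frames
        self.remaining = 0            # frames left during which updates stay blocked

    def allow_update(self, occlusion_detected):
        if occlusion_detected:
            self.remaining = self.hold_frames
            return False
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True

gate = TemplateUpdateGate(hold_frames=5)
frames = [False, True, False, False, False, False, False, False]   # occlusion flags per frame
decisions = [gate.allow_update(flag) for flag in frames]
```

Keeping the counter inside the gate means the tracker loop only needs to pass the per-frame occlusion flag.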

5.  Experiments 

We implemented the proposed approach in MATLAB with the SPAMS package^2 [17] and evaluated its performance on seven publicly available video sequences.^3 Our proposed tracker is compared with seven recent state-of-the-art trackers: Incremental Visual Tracking (IVT) [20], Multiple Instance Learning (MIL) [2], Visual Tracking Decomposition (VTD) [13], Generalized Kernel Tracking (GKT) [21], the L1 tracker (L1) [18], Covariance Tensor Learning (CTL) [23], and Online AdaBoost (OAB) [9]. The tracking results of the compared methods were obtained by running the source code or binaries provided by their authors, using the same initial positions in the first frame.

^2 http://www.di.ens.fr/willow/SPAMS/downloads.html

^3 Sequences 1-3 were from http://www.cs.toronto.edu/~dross/ivt/. Sequences 4-7 were from the PETS 2001 dataset, http://www.cvg.cs.rdg.ac.uk/PETS2001/, http://groups.inf.ed.ac.uk/vision/CAVIAR/CAVIARDATA1/, http://vision.stanford.edu/~birch/headtracker/seq/, and http://vision.ucsd.edu/~bbabenko/project_miltrack.shtml.

5.1.  Qualitative  Comparison 

The first sequence shows a vehicle undergoing drastic illumination changes as it passes beneath a bridge and under trees. Tracking results on several frames are presented in Fig. 3 (A). The BPR-L1 tracker, L1 tracker, IVT, and CTL are able to track the target well despite the drastic illumination changes, while the other trackers lose the target after it passes under the bridge.

The second sequence was captured in an indoor environment. Results on several frames are presented in Fig. 3 (B). The BPR-L1 tracker, L1 tracker, IVT, OAB, and MIL track the target faithfully throughout the sequence. The other trackers fail to track the target when there are both pose and illumination changes.

Results on the third sequence are shown in Fig. 3 (C). It shows a moving animal doll and presents challenging pose, lighting, and scale changes. The L1 tracker, IVT, and VTD eventually fail in frame 736 as a result of a combination of drastic pose and illumination changes. The BPR-L1 tracker and the remaining trackers are able to track the target throughout this long sequence, though GKT is slightly off the target in frames 521 and 546.

In the fourth sequence, a person walks from the bottom-right corner to the left of the image (Fig. 3 (D)). IVT fails to track the target from the start. VTD starts to drift from the target after about 200 frames and finally loses it. Our tracker and the remaining trackers successfully track the target.

In the fifth sequence, the target is a walking woman. In this video, the background color is similar to the color of the woman’s trousers, and the man’s shirt and pants have a similar color to the woman’s coat. In addition, the woman undergoes partial occlusion. Some result frames are given in Fig. 3 (E). Only the BPR-L1 tracker and the L1 tracker are able to track the target during the entire sequence. The other trackers lock onto the man when he occludes the woman after he comes out of the shop.

Results of the sixth sequence are shown in Fig. 3 (F). This sequence shows the robustness of our algorithm in handling occlusion and large pose changes. All the trackers track the target through the entire sequence except for MIL, which loses the target at frame 466.

Results on the seventh sequence are shown in Fig. 3 (G). Many trackers start drifting from the target when the man’s face is almost fully occluded by the book. The BPR-L1 tracker handles this very well and continues tracking the target when the occlusion disappears.


Figure 3. Tracking results of different algorithms. Frame numbers are listed after sequence names. Panels include (A) Car4 (23, 185, 210, 271, 306, 599); (B) David Indoor (318, 411, 442, 463, 595, 677); (D) PETS01D1Human1 (1417, 1496, 1562, 1631, 1744, 1821); (G) Occluded Face 2 (43, 152, 354, 473, 706, 786).


5.2.  Quantitative  Comparison 

To quantitatively compare robustness under challenging conditions, we manually labeled the ground truth of the seven sequences. The tracking error evaluation is based on the position error (in pixels) between the center of the tracking result and that of the ground truth. Ideally, the position errors should be close to zero.
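The center position error can be sketched as a per-frame Euclidean distance. The box format `(x, y, width, height)` with a top-left origin is an assumption, as is the function name.

```python
import numpy as np

def center_errors(pred_boxes, gt_boxes):
    """Per-frame Euclidean distance (pixels) between predicted and ground-truth box centers."""
    pred = np.asarray(pred_boxes, dtype=float)
    gt = np.asarray(gt_boxes, dtype=float)
    pred_c = pred[:, :2] + pred[:, 2:4] / 2.0   # centers of predicted boxes
    gt_c = gt[:, :2] + gt[:, 2:4] / 2.0         # centers of ground-truth boxes
    return np.linalg.norm(pred_c - gt_c, axis=1)

errs = center_errors([[10, 10, 20, 20], [13, 14, 20, 20]],
                     [[10, 10, 20, 20], [10, 10, 20, 20]])
```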

As shown in Fig. 4, the position errors of the BPR-L1 tracker are much smaller than those of the other trackers. With the BPR procedure and occlusion detection, the BPR-L1 tracker accounts for occlusion and makes better use of particle resampling for both computational efficiency and tracking accuracy.

6.  Conclusion 

We propose an efficient BPR-L1 tracker with minimum error bound and occlusion detection. We employ a two-stage sample probability scheme, in which most samples with small probabilities in the first stage are filtered out without solving the computationally expensive $\ell_1$ minimization. Our occlusion detection, coupled with a template update scheme, effectively prevents tracking results with heavy occlusion from being added to the target template set. Preventing incorrect updates to the target template set reduces track failures. Our proposed BPR-L1 method is computationally more efficient than the previous L1 trackers and demonstrates its effectiveness in handling a number of challenging sequences. We compared the BPR-L1 tracker with seven other state-of-the-art trackers, including the original L1 tracker, on seven sequences to validate its robustness.

Figure 4. Quantitative comparison of the trackers in terms of position errors (in pixels).